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Abstract 

The design, implementation and analysis of a free library for boundary 
element calculations is presented. The library is free in the sense of the GNU 
General Public Licence and is intended to allow users to solve a wide range of 
problems using the boundary element method. The library incorporates a fast 
multipole method which is tailored to boundary elements of order higher than 
zero, taking account of the finite extent of the elements in the generation of the 
domain tree. The method is tested on a sphere and on a cube, to test its ability 
to handle sharp edges, and is found to be accurate, efficient and convergent. 

1 INTRODUCTION 

The boundary element method has become accepted as a standard computational 
technique for many problems and, when combined with a fast multipole method, it 
has proven capable of solving very large problems, primarily in scattering calculations 
for acoustic and electromagnetic applications. This paper describes bem3d, a free 
library released under the GNU General Public Licence, for the solution of general 
problems using the boundary element method, incorporating a fast multipole method 
which is tailored to boundary element problems using higher-order elements. 

Free software has become an important part of the resources available to those 
working in scientific and computational fields over the last few years. Standard 
codes and libraries for many of the operations routinely carried out in computation 
are now freely available and, because their source code is openly published, they can 
be modified and improved by experts in the field. Because codes are distributed as 
libraries for a specific purpose, it is possible to select modules which can be combined 
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in codes for particular applications. This avoids the problem of reinventing the wheel, 
speeding code development and improving reliability. 

The library described in this paper makes use of a number of existing free li- 
braries [1-3] for computational applications, as well as a standard library of porta- 
bility and utility functions. The existing software is extended by adding functions 
for the handling of boundary elements, for fast multipole calculations and some sup- 
port for parallel systems. The intention is that other users will be able to make use 
of bem3d to solve their own problems by adding a Green's function for the partial 
differential equation in question. This paper describes the design choices which have 
been made in bem3d and the implementation of certain features including a novel 
adaptive fast multipole method for boundary elements. 

2 BASIC CODE STRUCTURE 

The boundary element library, bem3d, has been written with a number of goals in 
mind. The first of these is generality: users should be able to solve different physical 
problems and implement different solution techniques and elements as required. The 
second is that existing wheels should be used rather than being reinvented: the 
codes use other, high-quality, free libraries for various aspects of the work. These are 
the GNU Triangulated Surface library (GTS) for geometry handling [1], the GNU 
Scientific Library [2], LAPACK [3] in the iterative solver and the GLIB library for 
portability functions and special data structures, such as the tree used in the fast 
multipole method. The GMSH suite of meshing and visualization tools [4] is used to 
generate geometries and examine results. Using existing free software improves code 
reliability and gives users an automatic upgrade as these codes are improved. 

The library and its programs have been written in C, partly because C is nearly 
universally supported across the range of computing platforms, partly because the 
other libraries used have C interfaces, and also because C allows the use of pro- 
gramming techniques which make extension and customization easier, for example, 
through the use of loadable modules, an approach which is supported by the GLIB 
library. The resulting code has been written as a number of separate libraries to 
facilitate code re-use and to encourage generality in the coding. 

2.1 Problem definition 

For computational purposes, the partial differential equation for a problem is defined 
by its Green's function. Boundary integral equations have been applied in many areas 
of engineering and science and here we concentrate on the Laplace and Helmholtz 
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potential problems which have applications in fluid dynamics [5] and in acoustic 
scattering [6, 7]. The form of the integral equation is identical in each case, with only 
the Green's function being different. For an unbounded domain containing surface(s) 
S with outward pointing normal n, the potential 6 is given by: 

c ^=L G Z-^ iS - c -^b (ia) 

f 86 8G d kR 

C(xWx W 5 G ^-^T dS - G ^STr < lb » 

where k = uj/c is the wavenumber of the Helmholtz problem, R = |x — xi| and 
subscript 1 denotes variables of integration. The constant C depends on the field 
point position x: 

x inside S; 

C(x) = { 1 x outside S; (2) 

x on S 

with G = l/4:nR. When x is on S with the boundary condition x — 96/ dn 
prescribed, Equation 1 is a boundary integral equation for 6 and can be solved using 
the boundary element method. 

The geometry S is discretized into a number of elements with given nodes Xj, 
i = 1, . . . , N and interpolation (shape) functions L im This yields the system of 
equations: 

N N 
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Bij — J LjGdS m , 



where the summation over m is taken on elements which contain collocation point 
j, the shape function Lj is the shape function on element m corresponding to point 
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j and S m is the surface of element m. Conceptually, the boundary element method 
consists of discretizing the surface, assembling the matrices A and B, and solving 
the system of Equation 3. In practice, as we shall see below, this is not necessarily 
a feasible approach for large problems, but it is the default method and one which 
can be used as a baseline for assessing other techniques. 

In BEM3D, the integrations in assembling the matrices are performed using quadra- 
ture rules for singular integrands [8], Hayami's transformation for near-singular in- 
tegrals [9] and the symmetric rules of Wandzura and Xiao [10]. The system is solved 
using a library, SISL, based on LAPACK [3], which implements the 'templates' of 
Barrett et. al [11]. The solver library allows for parallel solution of problems us- 
ing the MPI standard and also has a matrix-free option for use with fast-multipole 
methods, as described below, §3. 

2.2 Element types 

Within the code of bem3d, elements are represented as a collection of triangles, 
based on the underlying GTS triangle data type. This raises the question of how to 
link computational information to geometric while retaining the freedom to introduce 
new element types. Previous work using GTS surfaces [12-14] introduced a new data 
type for the vertices which allowed them to have an integer index which could be used 
in assembling the system matrices. This is not a satisfactory solution, however, when 
more general elements are to be implemented. The first reason is that only linear 
triangular elements based on the underlying triangles can be be used; the second is 
that there is no good way to introduce discontinuities at sharp edges, an essential 
part of representing general geometries. 

The solution which has been adopted is shown in Figure 1. Within the library, 
an element is composed of a list of triangles, a list of collocation points with their 
indices, a list of geometric nodes, interpolation functions for surface data and inter- 
polation functions for the geometry. For an isoparametric element, the interpolation 
functions for the geometry are the same as those for the surface data and the geo- 
metric nodes are the same as the collocation points. This approach allows nodes to 
be indexed independently on each element, to support discontinuities, while allowing 
the underlying GTS surface to be geometrically valid. 

Figure 1 shows the triangular elements which have been implemented; quadri- 
lateral elements have also been implemented using the same approach. The shape 
functions employed are polynomials and elements of order zero to three have been 
developed. Figure 1 shows each element with its underlying GTS triangles and the 
collocation points. In the case of a zero order element, there is one collocation point, 
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Figure 1: Triangular element types: zero, first, second and third order: circles show 
computational nodes, solid lines the element boundary and dashed lines the GTS 
triangles. 

placed at the centre of the triangular element. The element is made up of three GTS 
triangles in order to allow for a constant solution across the computational triangle. 
In the case of a linear element, the computational triangle coincides with a GTS tri- 
angle and likewise the collocation points. For the second and third order elements, 
the element does not coincide with GTS triangles due to the element curvature, with 
the underlying triangles being formed by joining the collocation points with straight 
line segments. 

Finally, a mesh is implemented as a GTS surface supplemented with a list of 
elements, a lookup table connecting elements to triangles and a lookup table con- 
necting collocation points to their indices. These lookup tables are implemented as 
hash tables, a well-known method for efficient referencing of data. 
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Figure 2: Renumbering the nodes at the corner of a cube 



2.3 Detection and indexing of sharp edges 

In order to solve problems involving general geometries, a boundary element code 
must have some method of dealing with sharp edges. In the case of zero-order 
elements where collocation points lie on one, and only one, element, there is no 
difficulty, but in the general case where collocation points lie on more than one 
element, it is required that the scheme be able to support a discontinuity in solution 
or boundary condition at a sharp edge. As in previous work, this is implemented 
by giving points of discontinuity multiple indices. This allows for a discontinuity 
as the sharp edge is approached. An example of such indexing, at the corner of a 
cube, is shown in Figure 2. Points are shown with their surface normals and their 
indices, while elements are labelled with capital letters. As an example of a sharp 
edge, nodes on the line between elements B and C each have two indices while those 
between elements B and A have only one: the surface is smooth at these points. The 
exception is the corner of the cube where the node has three indices corresponding 
to the three planes which meet there. 

A method has been developed to automatically detect and index sharp edges and 
corners, an important practical point in applications such as aerodynamics [15] where 
the imposition of an edge condition can be difficult and will ideally be performed 
without user intervention. To index the nodes of a mesh, each element is visited 
in turn. On each element, the currently unindexed nodes are visited. If a node is 
shared with adjoining elements, the surface normal at the node is computed on all 
of the elements. If the angle between normals is less than some specified value, the 
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Figure 3: Structure of bem3d codes: arrows indicate links to libraries. Executables 
are shown boxed and libraries circled. 

surface is taken to be smooth and the node is given one index. Otherwise, it is given 
a different index for each normal which deviates from the reference normal by more 
than the specified tolerance. 

2.4 Code structure 

The structure of the codes in bem3d is shown in Figure 3. The sequence shown 
contains all of the steps which might be needed, but in many problems they will 
not all be necessary. The first is a pre-processing stage to generate the geometry 
and elements and to index the nodes. In most cases, the geometry is generated 
using GMSH [4] and converted to the bem3d format and indexed using a conversion 
program. Boundary conditions are then supplied, either directly, or by computing 
the field and its gradients from a prescribed source. If the problem is being solved 
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by direct solution of a matrix system, the matrices are assembled and stored to be 
used by the solver; if the fast multipole method is used, the information required 
for the matrix multiplications is generated and applied directly in the same code. 
If required, the field quantities can then be computed. Finally, any post-processing 
and visualization which might be required are performed by converting the solutions 
and mesh to the GMSH format. 

The linkages between libraries and codes are shown in Figure 3 with executables 
shown boxed and libraries circled. The libraries which are used are the main bem3d 
library, wmpi, a set of wrapper functions for interfacing to MPI, SISL, a simple 
iterative solver library and GMC, which contains the code for fast and accelerated 
multipole calculations. 



3 ACCELERATED EVALUATION OF INTEGRALS 

Since its introduction, the fast multipole method has been used by a number of work- 
ers to accelerate the boundary element and to reduce its memory requirements [16]. 
The standard approach of direct solution of the matrix system has time and memory 
requirements which scale as iV 2 , which quickly exceed the resources available on even 
the most powerful computers. The method implemented in bem3d allows problems 
to be solved directly on parallel systems using the MPI standard but it was consid- 
ered better to implement a fast multipole method for solution of problems and for 
calculation of radiated fields. The fast multipole method was originally developed for 
point sources, rather than elements of finite extent, and is usually implemented using 
spherical harmonics. In the method implemented in bem3d, the basic algorithm is 
similar to the original adaptive fast multipole method [17], but with two important 
differences. The first is the use of Taylor series in place of spherical harmonics, as 
used by Tausch in his non-adaptive method [18, 19], in order to cater for a wide va- 
riety of Green's functions. The second is a convergence radius criterion for the near 
field, an idea developed [20] to allow for finite size elements which may not be com- 
pletely contained within a box of the tree decomposing the computational surface. 
The rest of this section details the sequence of procedures involved in performing a 
fast multipole solution of the boundary element equations. 



3.1 Hierarchical domain decomposition 

The first step in a fast multipole method is to recursively decompose the domain into 
a tree structure. The tree is made up of boxes aligned with the coordinate axes with 
each box containing up to eight boxes, formed by dividing the main box along each 



8 



Figure 4: Decomposition of a box of points: the parent box is shown with one child 
box indicated in bold 




of the axes, Figure 4. The main box is called a 'parent box' and the boxes formed 
by subdivision are the 'child boxes'. The decomposition of the domain is initialized 
by forming a box which contains all of the vertices of the computational surface S. 
This box is said to be at level zero. The box is then subdivided to form the level 
one boxes, which are themselves divided to form the level two boxes and so on. At 
each stage of division, the number of vertices in a box is checked. If it is less than 
some prescribed number B, division of the box is terminated and the box is called a 
'leaf. Recursive division in this manner gives rise to a tree structure where the box 
at each node of the tree encloses all of the boxes in the nodes below it. 

The process of subdivision is shown in two dimensions in Figure 5. An initial 
distribution of points is enclosed by a level zero box which is divided into four level 
one child boxes, labelled A, B, C and D, Figure 5a. Following the decomposition of 
box C, Figure 5b shows the level two boxes a, b, c and d. In this case, box c has 
fewer than B points and is subdivided no more. Following the subdivision one more 
time, box a is divided into four level three boxes, 1, 2, 3 and 4, of which boxes 2 
and 3 are not divided again. 

Part of the resulting tree structure is shown in Figure 6, with leaf nodes of the 
tree shown boxed and parent nodes shown circled. The parent node R at level has 
four child nodes, corresponding to the four boxes of Figure 5 a and so forth. The tree 
structure can be used to rapidly locate a point and as the basis for fast integration 
methods. 

The final step in generating the tree is to attach the elements of the mesh to the 
leaf nodes. The method adopted here is to list the elements attached to the vertices 
of a leaf box and link this list to the leaf node. 

3.2 Taylor multipole expansions 

A fast multipole method works by replacing a list of elements attached to a box with 
a set of multipole coefficients which can be used to compute the field due to the 
elements in the box far field, where 'far field' will be defined more precisely below. 
This was originally done using an expansion of the field in spherical harmonics, but 
here Taylor series expansions have been used since they allow the unified treatment 
of a range of Green's functions [18, 19], an important point in constructing a general 
code. 

The field integrals to be considered are of the form: 




(4) 
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where /( • ) is a source term and Sb is the surface of the elements inside some box of 
the tree decomposing the surface S. To approximate /(x), we expand the Green's 
function in a Taylor series about some point x c inside the box: 

H 

G ( R ) ^EE (- 1 ) hG mnk (x 1 - x c ) m ( yi - y c ) n { Zl - z c )\ (5a) 

h=0 m,n,k 

111 d h G 



ml n\ k\ dx m dy n dz k 



(5b) 

Xl=X c 



where the summation over m, n and k is taken over all values of m + n + k = h and 
the symmetry relation dG/dxi = —dG/dx has been used. The integral can then be 
rewritten: 

H 

/(X) W E i-^Gmnklntnk, (6) 

h=0 m,n,k 

I m nk = / /(xi)(xi - x c ) m (yi - y c ) n {z x - z c ) k dS b . 
Js b 

The integrations can be performed the interpolation functions on each element so 
that: 



<■ rank 



J2fM mnk) (7) 



where the index i runs over all points on elements connected to the source box and 
w (mnk) . g a we jg^ p rec0 mputed as: 

w (mnk) = I Li{xi _ _ yc)n{zi _ Zc) k ^ (8) 

Js b 

which is independent of the underlying Green's function and so can be stored and 
used in multiple problems. 

To evaluate the field due to the source elements contained in a box, the Taylor 
series can be used in an expansion about a point x^,: 

L 

/(x) « Yl E p ™*( x - x T(y - y'c) n ( z - <) k 

1=0 m,n,k 
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where the expansion coefficients F mn k are computed by differentiation of Equation 6: 

Qq+r+s j ' ' 



dx q dy r dz s 



h=0 m,n,k 

H h 1 

Fqrs ^ ^ ^ ^ ( 1) — j j iGm+^ii+r.fc+s^mnfc) (9) 

/i=0 m,n,k ^ 

which can be used to generate a local expansion about the centre of one box of the 
field due to another box. 

Two basic tools are now required which allow the multipole expansions of a tree 
of boxes to be used to accelerate the evaluation of the field integrals. The first is the 
coefficient shift operator which allows coefficients l mn k evaluated about one centre to 
be shifted to another centre. This allows the coefficients about the centre of a parent 
box to be evaluated from the coefficients of its child boxes rather than having to be 
recomputed. Writing: 

lLnk= [ f(xi)(x 1 -J e ) m (y 1 -iti n (z 1 -J e ) k dS, 
Js 

= [ /(xxXx! - x c + (x c - x' c )) m ( yi -y c + (y c - y' c )) n { Zl - z c + (z c - z' c )) k dS, 
Js 

and applying the binomial theorem: 

k 



i'mnk = E E E (T) (D ^ ~ X ^ iyC - ^ ~ ^h^n-r^ 

q=0 r=0 s=0 \ ^ / \ / \ / 

(10) 



which allows the coefficients I' mnk about centre x' c to be computed in terms of coef- 
ficients lmnk about centre(s) x c . 

The second basic tool is the expansion shift operator which shifts the expansion 
about one centre x c , Equation 9, to give an expansion about another centre x' c . 
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Writing: 

L 

1=0 m,n,k 
L 

= E E Fmnk( - X - x> o + K - x»)Y\y -y'o + (y'o - yo)) n (z - 4 + +(4 - ^o)) fe , 

1=0 m,n,k 
L m n k 

= EE Fmnk E E D* - ^) m " 9 (^o - *o)% - i/or-Vo - wn* - z' ) k ~ s (; 

1=0 m,n,k q=0 r=0 s=0 

(11) 

from which the contribution of the original coefficients F mn k to the shifted coefficients 
F' mnk can be identified. This allows a parent box expansion to shifted to its children. 

3.3 Derivatives of Green's functions 

In order to compute the field due to a set of multipole coefficients, we require the 
derivatives of the Green's function of a problem. Tausch [18] gives a recursive algo- 
rithm which efficiently and stably generates the derivatives of a Green's function up 
to some required order. 

Given a Green's function G(R) which is a function of source-observer distance 
R — \r\, r — (x,y,z), the function: 

(h) 



is defined. The sequence of functions can be computed recursively. For the 
Laplace Green's function, G = l/inR and: 

G (o) (i2) = _L_ G (m-d = J_h+l G ( h ) (Rl 

while for the Helmholtz Green's function, G = exp[jA;i?]/47i\R: 

G ( h+ i) = _?h±± G ( h ) {R) _ ^G^\R), 
with the correction of a typographical error in the original reference. 
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Figure 7: Elements stick out of their boxes 



Given the sequence G^ h \ the derivatives of the functions with respect to the 
components of r can be found from: 

Qm+l+n+k Q{h) Qrn-l+n+k Q(h+l) Qm+n+k Q(h+l) 

— 77^ j ^ (12a) 

dx m+1 dy m dz k dx m ~ l dy n dz k dx m dy m dz k ' 

Qm+n+l+kQ{h) Qrn+n-l+k Q(h+l) Qm+n+k Q(h+l) 

dx m dy n+l dz k ~ U dx m dy n ~ 1 dz k + V dx m dy m dz k ' ^ 12b ^ 

Qm+n+k+lQ(h) Qm+n+k -1q(Ii+1) Qm+n+kQ(h+l) 

— |_ z f 12c) 

dx m dy n dz k+1 dx m dy n dz k ~ 1 dx m dy n dz k ' v ' 

To compute the derivatives of the Green's function up to a given order H, Tausch 
gives the following scheme [18]: 

1. compute G w , h = 0,...,H; 

2. compute d m+n+k G^ / 'dx m dy n dz k , for m + n + k < H - h, h = H - 1, . . . , 0, 
using Equations 12; 

3. set d m+n+k G/dx m dy n dz k = d m+n+k G { ^ /dx m dy n dz k , for m + n + k = 0, . . . , H. 



3.4 Box near field 

The tree decomposing the domain can be used to accelerate the computation of field 
integrals, using the methods described in the next two sections. The basic approach 
is to use the multipole expansion about a box centre to compute the field 'far' from 
the box and to use full integration over the box elements 'close' to the box. This 
requires a definition of 'near' and 'far' which can be used in deciding which procedure 
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Figure 8: 'Inheritance' of child convergence radius rc by parent box with convergence 
radius rp 

to use. In the original version of the fast multipole method [17], the point sources 
used were completely contained in a box and the near field of a box was defined as 
its neighbours. When elements of finite extent are used, they will inevitably stick 
out of the boxes to which they are attached and it is not clear how to define the 
boundary of the 'far field' of the box. 

A number of approaches have been discussed to deal with this problem, some of 
them limited to zero order elements. Tausch [19] defines a separation ratio which 
can be used in deciding when boxes lie in each others' near field. This has the 
disadvantage that it requires the boxes to be at the same level in the tree, i.e. that 
they be the same size. Another approach, employed in a spherical harmonic fast 
multipole code [20], is to use the convergence radius of the expansion about a box 
centre define the box near field. This approach has been used in bem3d, with the 
convergence radius rc given by the maximum distance between the box centre and a 
vertex of an element attached to the box (which may well not lie in the box proper). 
A point lies in the far field of the box if its distance to the box centre is greater than 
r c . Two boxes, which need not be the same size, lie in each others' far fields if the 
distance between their centres is greater than the sum of their convergence radii. 

Finally, the convergence radius rp of a parent box can be derived from the radii 
of its child boxes using the approach shown in Figure 8: 

r P = r c + d c /2, (13) 

where rc is the maximum of the convergence radii of the child boxes and dc is the 
length of a child box's diagonal. 
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3.5 Accelerated field evaluation 

The first principal application of the basic techniques of the previous sections is in 
accelerated evaluation of the field due to a known source distribution over a surface. 
Given the tree decomposition of Section 3.1, the multipole moments computed at 
each leaf node and shifted upwards in the tree, Section 3.2 and a field point x, the 
field may be evaluated by traversing the tree and computing the contribution from 
each box of the tree in turn, descending the tree only far enough to evaluate the 
contribution of a branch to sufficient accuracy. 
The algorithm may be summarized as follows: 

1. set /(x) = 0; 

2. set the current box to the root of the tree at level 0; 

3. for the current box: 

(a) if the distance to the box centre |x — x c | > rc, evaluate the multipole 
expansion of Equation 9 and add to /(x); terminate descent of this branch; 

(b) if the distance to the box centre |x — x c | < r c and the current box is a 
parent, repeat step 3 for each child box; 

(c) if the distance to the box centre |x — x c | < r c and the current box is a 
leaf node, evaluate the field by direct integration over the elements of the 
box and add to /(x); terminate descent of this branch. 

This algorithm can be used to evaluate the field at a small number of points which 
may be far from the surface S: it is efficient and avoids the setup costs involved 
in the fast multipole method proper. When the field must be evaluated at a large 
number of points, for matrix multiplication, say, the fast multipole method described 
in the next section is used. 

3.6 Fast matrix multiplication 

In order to further accelerate the computation of integrals in the boundary element 
method, the technique of the previous section can be extended to a true fast mul- 
tipole method, at the expense of some extra pre-processing. This yields a method 
which speeds up the matrix multiplications required for the solution of the boundary 
element equations by using local expansions of the field about the centre of each leaf 
node of the tree. 
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Figure 9: Interaction and near field lists for a tree box 

The algorithm is similar to that of the previous section but with some extra pro- 
cedures related to the evaluation of the near field terms and the separation of boxes 
into near and far field. The first step is, as before, to generate the tree decomposing 
the surface, including the multipole coefficients and convergence radii for the boxes. 
Each box now has linked to it two lists: the near-field list and the interaction list. 
The near-field list consists of boxes whose contribution to the field in the box must 
be computed by direct integration over the attached elements. The interaction list 
is made up of boxes whose contribution can be computed using the multipole expan- 
sion. Figure 9 shows an example. The light grey box in the middle of the grid is the 
box whose lists are shown. The darker boxes are those in the near-field list, using 
the criterion of Section 3.4 which states that boxes are well-separated iff: 

|x«-x( 2 )|>rg) + ^, (14) 

where x.^ and are the centre and radius of convergence of each of the boxes. 
The darkest boxes are those on the interaction list whose contribution to the field 
in the main box can be computed using multipole expansions. The field in the box 
is computed as a sum of contributions from the near field and interaction list boxes 
and from the local expansion about the centre of the parent box which is computed 
in the same way. In Figure 9, the box labelled 1 is in the near field list, even though 
it is larger than the main box, because it is a leaf node and must contribute directly 
to the field. The box labelled 2 is in the near field list, even though it does not touch 
the main box, because it does not meet the separation criterion of Equation 14. 

The near-field and interaction lists can be generated using the following algorithm, 
applied to each box B descending the tree starting at level zero: 

1. initialize the interaction and near-field lists of B to be empty; 

2. traverse the near field list of the box's parent P and for each box N in the list: 
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(a) if N is a leaf node: 

i. if N and B are well-separated, add N to the interaction list of B; 

ii. if N and B are not well-separated, add TV to the near-field list of B; 

(b) if N is not a leaf node, traverse the child boxes C of N and: 

i. if C and B are well-separated, add C to the interaction list of B; 

ii. if C and B are not well-separated, add C to the near-field list of B; 

Given a tree with multipole coefficient weights w^ mnk ^ up to order m + n + k = if, 
and near-field and interaction lists for each box, a matrix multiplication can be 
performed as follows: 

1. for each leaf box, compute the multipole moments I mn k = ^ fi w f' mk \ 171 + 
n + k <H; 

2. traverse tree from bottom to top, calculating box moments by accumulating 
the contribution from child boxes using Equation 10; 

3. traverse tree from top to bottom, computing local expansions to order L: 

(a) shift parent local expansion to box centre using Equation 11; 

(b) add local expansion terms due to boxes in interaction list; 

each leaf box now has an expansion about its centre which gives the field in 
the box due to all other boxes except those in the near-field list; 

4. for each leaf box, evaluate the local expansion at each enclosed vertex and add 
the contribution from boxes in the near-field list. 

In practice, the near-field contribution is precomputed and stored as a sparse matrix 
for re-use in the solution procedure and the expansion order L is a function of depth 
in the tree [19] in order to avoid excessively high order expansions in small boxes. 
The order Li at level / is set to Li = min(L max , L min + Z max — I). 

4 NUMERICAL TESTS 

The performance of the boundary element codes has been assessed in terms of accu- 
racy, speed and memory. The test uses a point source placed inside the surface to 
generate a potential and a boundary condition (potential gradient). The system is 
solved for the boundary condition which should recover the original surface potential 
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due to the point source. The error measure is the rms difference between the original 
and computed potential. Two geometries have been used, a unit sphere and a cube of 
edge length 2 centred on the origin. The sphere is generated by successive refinement 
of an initial surface, allowing tests to be carried out on a smooth surface with control 
over the number of vertices. The cube is generated using GMSH [4] with successively 
smaller edge lengths. This tests the ability of bem3d to deal with sharp edges using 
the methods outlined earlier. Tests have been conducted with first and second order 
elements and the Laplace and Helmholtz (k = 2) equations have been solved. The 
fast multipole method was used to solve problems of all sizes and the direct matrix 
method was used for the smaller problems which would fit in memory. Calculations 
were performed on a 3GHz Pentium 4 with 1Gb of memory. 

Data reported are the number of vertices, the maximum number of vertices B in 
a tree box, the setup time and solution time and the maximum memory used during 
solution. For matrix solution, the setup time is the time required to assemble the 
system matrices. For the fast multipole method, the setup time is that needed to 
generate the tree for the mesh and compute and store the near-field matrix. In neither 
case is the time needed for disk storage and recovery included. In the fast multipole 
case, the time reported is the total setup time for one problem. In practice, because 
the multipole coefficient weights are independent of the problem being solved, the 
setup time for multipole problems could be reduced by re-using the weights. In 
both cases, the solution time reported is that needed for solution using the stabilized 
biconjugate gradient method [11]. 

As a reference case, the fast multipole solver uses a minimum expansion order 
of 3 and a maximum of 8. For larger problems, the error ceases to reduce with 
element size. This appears to be due to the multipole precision being larger than 
the discretization error. As a check, a number of the larger problems were solved 
with the minimum expansion order increased to 4. These results are included in the 
tabulated data and indicated by an asterisk. 
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Table 1: Code performance for Laplace equation on sphere using first order elements 



FMM Direct 



TV 


B 




e 


^setup/s 


t/s 


Mb 




e 


^setup/s 


t/s 


Mb 


42 


8 


3.03 x 10" 


-2 


1 





2 


5.31 x 10- 


-3 


1 








162 


16 


1.55 x lO - 


-2 


6 


3 


5 


1.38 x 10" 


-3 


5 





2 


642 


32 


3.48 x 10- 


-4 


40 


7 


10 


3.46 x 10- 


-4 


80 





8 


2562 


64 


9.03 x 10" 


-5 


473 


8 


49 


8.62 x 10- 


-5 


1322 





104 


2562* 


64 


8.59 x 10- 


-5 


491 


11 


49 












10242 


128 


7.32 x 10- 


-5 


1758 


54 


203 












10242* 


128 


2.96 x 10" 


-5 


1897 


70 


203 













Table 2: Code performance for Laplace equation on sphere using second order ele- 
ments 

FMM Direct 



N 


B 




e 


^setup/s 


t/s 


Mb 




e 


^setup/ S 


t/s 


Mb 


162 


8 


2.04 x 10" 


-4 


5 


3 


5 


2.03 x 10- 


-4 


3 





2 


642 


16 


2.55 x 10- 


-5 


25 


7 


11 


2.20 x 10- 


-5 


32 





8 


2562 


32 


3.33 x 10" 


-5 


102 


47 


37 


2.32 x 10- 


-6 


474 





104 


10242 


64 


4.82 x 10" 


-5 


749 


53 


193 












40962 


128 


8.12 x 10' 


-5 


3611 


264 


842 
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Table 3: Code performance for Helmholtz equation on sphere using first order ele- 
ments 



FMM Direct 



N 


B 




e 


^setup/ S 


t/s 


Mb 




e 


^setup/s 


t/s 


Mb 


42 


8 


3.46 x 10" 


-•2 


1 





2 


5.14 x 10" 


-3 


1 





430 


162 


16 


1.45 x 10" 


-2 


10 


20 


6 


1.28 x 10" 


-3 


6 





3 


642 


32 


3.30 x 10" 


-4 


68 


35 


12 


3.15 x 10" 


-4 


118 





15 


2562 


64 


1.18 x 10" 


-4 


843 


40 


70 


7.73 x 10" 


-5 


1758 


1 


207 


2562* 


64 


7.75 x 10" 


-5 


1398 


25 


122 












10242 


128 


1.83 x 10" 


-4 


3020 


284 


284 












10242* 


128 


2.06 x 10" 


-5 


4721 


231 


475 













Table 4: Code performance for Helmholtz equation on sphere using second order 
elements 

FMM Direct 



N 


B 




e 


^setup/ S 


t/s 


Mb 




e 


^setup/s 


t/s 


Mb 


162 


16 


1.71 x 10" 


-•2 


5 


9 


6 


3.62 x 10" 


-4 


3 








642 


32 


8.00 x 10" 


-5 


31 


23 


14 


4.30 x 10" 


-5 


39 





15 


2562 


64 


9.64 x 10" 


-5 


247 


25 


74 


4.64 x 10" 


-6 


583 


1 


206 


2562* 


64 


7.75 x 10" 


-5 


1398 


25 


122 












10242 


128 


1.38 x 10" 


-4 


888 


165 


282 












10242* 


128 


2.06 x 10" 


-5 


4721 


231 


475 
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Table 5: Code performance for Laplace equation on cube using first order elements 



FMM Direct 



N 


B 




e 


^setup/s 


t/s 


Mb 




e 


^setup/ S 


t/s 


Mb 


117 


5 


2.22 x 10" 


-3 


7 


9 


5 


2.22 x 10" 


-3 


4 








431 


10 


4.85 x 10" 


-4 


21 


21 


9 


4.87 x 10" 


-4 


30 





5 


1763 


20 


1.17 x 10" 


-4 


114 


81 


29 


1.19 x 10" 


-4 


552 





50 


7281 


40 


2.55 x 10" 


-5 


618 


124 


98 












28029 


80 


2.70 x 10" 


-5 


5003 


227 


512 












28029* 


80 


7.85 x 10" 


-6 


17691 


391 


854 













Table 6: Code performance for Laplace equation on cube using second order elements 



FMM Direct 



N 


B 




e 


^setup/ S 


t/s 


Mb 




e 


^setup/s 


t/s 


Mb 


378 


10 


1.25 x 10" 


-4 


27 


47 


8 


1.24 x 10" 


-4 


10 





4 


1538 


20 


1.71 x 10" 


-5 


69 


69 


27 


5.61 x 10" 


-6 


154 





39 


1538* 


20 


6.90 x 10" 


-6 


73 


94 


27 












6674 


40 


9.10 x 10" 


-6 


361 


147 


106 


2.78 x 10" 


-7 


2982 


1 


695 


6674* 


40 


5.45 x 10" 


-7 


614 


279 


147 












28362 


80 


2.09 x 10" 


-5 


2551 


253 


636 
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Table 7: Code performance for Helmholtz equation on cube using first order elements 



FMM Direct 



TV 


B 




e 


^setup/s 


t/s 


Mb 




e 


^setup/s 


t/s 


Mb 


117 


5 


3.46 x 10' 


-3 


8 


67 


5 


3.47 x 10- 


-3 


5 





2 


431 


10 


7.34 x 10' 


-4 


46 


158 


34 


7.43 x 10- 


-4 


37 





8 


1763 


20 


1.69 x 10- 


-4 


269 


489 


36 


1.68 x 10- 


-4 


657 


1 


98 


7281 


40 


5.11 x 10- 


-5 


854 


408 


128 












28029 


80 


7.18 x 10- 


-5 


6246 


1142 


713 













Table 8: Code performance for Helmholtz equation on cube using second order ele- 
ments 

FMM Direct 



TV 


B 




e 


^setup/s 


t/s 


Mb 


e 


^setup/s t/s 


Mb 


378 


10 


2.49 x 10- 


-4 


31 


168 


10 


2.45 x 10" 4 


12 


6 


1538 


20 


5.73 x 10- 


-5 


81 


313 


34 


9.12 x 10~ 6 


189 1 


76 


1538* 


20 


1.66 x 10- 


-5 


87 


421 


34 








6674 


40 


2.51 x 10- 


-5 


431 


1149 


147 








6674* 


40 


1.35 x 10" 


-6 


775 


1672 


205 








28362 


80 


5.36 x 10- 


-5 


3009 


901 


892 
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The first data presented are for the Laplace equation solved on a sphere, Tables 1 
and 2. The number of points increases by a factor of about four between test cases 
and the computing demands increase at the expected rate, with the memory require- 
ment scaling roughly as N 2 for the matrix solution and roughly linearly for the fast 
multipole method. This is also true for the setup time but not for the solution time, 
which for the matrix method is less than one second in all cases while it increases 
roughly linearly for the fast multipole method. For the smaller problems N < 2562, 
the error is essentially the same for both solution methods but beyond this point the 
error from the fast multipole method stops falling. This behaviour can be cured by 
increasing the minimum expansion order as shown by the test cases marked with an 
asterisk in Table 1 where the reduction in error with N has been restored. It can also 
be noted that, even without re-using multipole weights, the setup time for the fast 
multipole method is much less than that for the direct solver, far outweighing any 
advantage in solution time for the direct method, even for relatively small problems. 

The solvers behave similarly when applied to the Helmholtz problem. The decline 
in error with N is similar, with a need to increase the minimum expansion order at 
large N > 2562. 

The second test case for the Laplace and Helmholtz solvers is that of a cube 
of edge length two, used to check the performance of the solver on a geometry 
with sharp edges, a type of problem known to be quite ill-conditioned. In the case 
of the Laplace equation, Tables 5 and 6, the fast multipole solver is comparable 
in accuracy to the direct method, with the caveat that the minimum expansion 
order must be increased for the larger problems, but is far superior in time and 
memory consumption. Likewise, the results for the Helmholtz equation show the 
expected reduction in error with vertex number N and a much lower setup time for 
the fast multipole method, although the solution time is rather large for small N 
using second order elements. This appears to be due to the inherent ill-conditioning 
of the problem and because the ratio of edge vertices to non-edge vertices is greater 
for smaller problems. No attempt was made to optimize the mesh to improve the 
handling of sharp edges, as in other studies [19], since it is intended that bem3d be 
able to handle unstructured meshes produced by a standard mesh generator, without 
requiring intervention from the user. 

5 CONCLUSIONS 

A new library has been developed for the boundary element solution of general prob- 
lems. The library is free software released under the GNU General Public Licence. It 
incorporates a new adaptive fast multipole method for boundary element problems 
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which gives comparable accuracy to direct matrix solution at a much lower cost in 
terms of time and memory, making possible the solution of problems which would 
otherwise be too large to solve with the available resources. It includes higher order 
elements and has provision for the addition of user-defined elements and Green's 
functions without extensive recoding. 

Development of BEM3D continues with the intention of adding further element 
types and a parallel implementation of the fast multipole method. 
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